Scheduling and Optimization of Fault-Tolerant Distributed Embedded Systems
Abstract
Safety-critical applications must function correctly even in the presence of faults. This thesis deals with techniques for tolerating the effects of transient and intermittent faults. Re-execution, software replication, and rollback recovery with checkpointing are used to provide the required level of fault tolerance. These techniques are considered in the context of distributed real-time systems with non-preemptive static cyclic scheduling. Safety-critical applications also have strict time and cost constraints, so faults must be tolerated while these constraints are still satisfied. Hence, efficient system design approaches that take fault tolerance into account are required. The thesis proposes several design optimization strategies and scheduling techniques that consider fault tolerance. The design optimization tasks addressed include, among others, process mapping, fault-tolerance policy assignment, and checkpoint distribution. Dedicated scheduling techniques and mapping optimization strategies are also proposed to handle customized transparency requirements associated with processes and messages. By providing fault containment, transparency can potentially improve the testability and debuggability of fault-tolerant applications. The efficiency of the proposed scheduling techniques and design optimization strategies is evaluated with extensive experiments on a number of synthetic applications and a real-life example. The experimental results show that considering fault tolerance during system-level design optimization is essential when designing cost-effective fault-tolerant embedded systems.

This work has been partially supported by the National Graduate School in Computer Science (CUGS) of Sweden.
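The abstract names re-execution and rollback recovery with checkpointing as the time-redundancy techniques, and lists checkpoint distribution as a design task, but gives no cost model. The following is a minimal sketch under an assumed overhead model that is not taken from the thesis. It assumes at most k transient faults per process, an error-detection overhead alpha per executed segment, a recovery overhead mu per fault, a saving overhead chi per checkpoint, and equidistant checkpoints. It compares the worst-case length of one process under plain re-execution and under checkpointing, and chooses the number of checkpoints by exhaustive search. All names (`Process`, `reexecution_length`, `checkpointing_length`, `best_checkpoint_count`) and all parameter values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """Hypothetical process parameters; all times share one unit (e.g. ms)."""
    wcet: float   # worst-case execution time without faults
    alpha: float  # assumed error-detection overhead per executed segment
    mu: float     # assumed recovery overhead paid once per fault
    chi: float    # assumed overhead of saving one checkpoint


def reexecution_length(p: Process, k: int) -> float:
    """Worst case with plain re-execution: the original run plus k full
    re-executions, each preceded by a recovery overhead."""
    return (p.wcet + p.alpha) + k * (p.mu + p.wcet + p.alpha)


def checkpointing_length(p: Process, k: int, n: int) -> float:
    """Worst case with n equidistant checkpoints (assumed model): every
    segment pays detection and checkpointing overhead, and each of the k
    faults rolls back and re-runs only one segment of length wcet / n."""
    segment = p.wcet / n
    return p.wcet + n * (p.alpha + p.chi) + k * (p.mu + segment + p.alpha)


def best_checkpoint_count(p: Process, k: int, n_max: int = 64) -> tuple[int, float]:
    """Exhaustively search n = 1..n_max for the shortest worst-case length."""
    best_n = min(range(1, n_max + 1), key=lambda n: checkpointing_length(p, k, n))
    return best_n, checkpointing_length(p, k, best_n)


if __name__ == "__main__":
    p = Process(wcet=60.0, alpha=2.0, mu=5.0, chi=3.0)
    k = 2  # tolerate up to two transient faults
    print("re-execution:", reexecution_length(p, k))  # re-execution: 196.0
    n, length = best_checkpoint_count(p, k)
    print("checkpoints:", n, "length:", length)       # checkpoints: 5 length: 123.0
```

Under this assumed model, adding checkpoints shortens each rollback but adds per-checkpoint overhead, so the worst-case length first falls and then rises again. Exhaustive search is used here only because it is simple and obviously correct for small `n_max`, not because it reflects how the thesis distributes checkpoints. Replication, by contrast, would trade this time overhead for k extra replicas running on other nodes.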